Goto

Collaborating Authors

 soil property


Self-supervised and Multi-fidelity Learning for Extended Predictive Soil Spectroscopy

Sun, Luning, Safanelli, José L., Sanderman, Jonathan, Georgiou, Katerina, Brungard, Colby, Grover, Kanchan, Hopkins, Bryan G., Liu, Shusen, Bremer, Timo

arXiv.org Artificial Intelligence

We propose a self-supervised machine learning (SSML) framework for multi-fidelity learning and extended predictive soil spectroscopy based on latent space embeddings. A self-supervised representation was pretrained with the large MIR spectral library and the Variational Autoencoder algorithm to obtain a compressed latent space for generating spectral embeddings. At this stage, only unlabeled spectral data were used, allowing us to leverage the full spectral database and the availability of scan repeats for augmented training. We also leveraged and froze the trained MIR decoder for a spectrum conversion task by plugging it into a NIR encoder to learn the mapping between NIR and MIR spectra in an attempt to leverage the predictive capabilities contained in the large MIR library with a low cost portable NIR scanner. This was achieved by using a smaller subset of the KSSL library with paired NIR and MIR spectra. Downstream machine learning models were then trained to map between original spectra, predicted spectra, and latent space embeddings for nine soil properties. The performance of was evaluated independently of the KSSL training data using a gold-standard test set, along with regression goodness-of-fit metrics. Compared to baseline models, the proposed SSML and its embeddings yielded similar or better accuracy in all soil properties prediction tasks. Predictions derived from the spectrum conversion (NIR to MIR) task did not match the performance of the original MIR spectra but were similar or superior to predictive performance of NIR-only models, suggesting the unified spectral latent space can effectively leverage the larger and more diverse MIR dataset for prediction of soil properties not well represented in current NIR libraries.


Towards Autonomous In-situ Soil Sampling and Mapping in Large-Scale Agricultural Environments

Nguyen, Thien Hoang, Muller, Erik, Rubin, Michael, Wang, Xiaofei, Sibona, Fiorella, McBratney, Alex, Sukkarieh, Salah

arXiv.org Artificial Intelligence

Abstract-- Traditional soil sampling and analysis methods are labor-intensive, time-consuming, and limited in spatial resolution, making them unsuitable for large-scale precision agriculture. T o address these limitations, we present a robotic solution for real-time sampling, analysis and mapping of key soil properties. Our system consists of two main sub-systems: a Sample Acquisition System (SAS) for precise, automated in-field soil sampling; and a Sample Analysis Lab (Lab) for real-time soil property analysis. The system's performance was validated through extensive field trials at a large-scale Australian farm. Experimental results show that the SAS can consistently acquire soil samples with a mass of 50g at a depth of 200mm, while the Lab can process each sample within 10 minutes to accurately measure pH and macronutrients. These results demonstrate the potential of the system to provide farmers with timely, data-driven insights for more efficient and sustainable soil management and fertilizer application. I. INTRODUCTION Achieving sustainable agricultural resource management requires accurate, high-resolution, and up-to-date data on soil properties such as pH and macronutrients [1], [2]. However, conventional soil sampling and testing methods fail to address this need at scale.


Estimating properties of a homogeneous bounded soil using machine learning models

Kalimeris, Konstantinos, Mindrinos, Leonidas, Pallikarakis, Nikolaos

arXiv.org Artificial Intelligence

This work focuses on estimating soil properties from water moisture measurements. We consider simulated data generated by solving the initial-boundary value problem governing vertical infiltration in a homogeneous, bounded soil profile, with the usage of the Fokas method. To address the parameter identification problem, which is formulated as a two-output regression task, we explore various machine learning models. The performance of each model is assessed under different data conditions: full, noisy, and limited. Overall, the prediction of diffusivity $D$ tends to be more accurate than that of hydraulic conductivity $K.$ Among the models considered, Support Vector Machines (SVMs) and Neural Networks (NNs) demonstrate the highest robustness, achieving near-perfect accuracy and minimal errors.


LimeSoDa: A Dataset Collection for Benchmarking of Machine Learning Regressors in Digital Soil Mapping

Schmidinger, J., Vogel, S., Barkov, V., Pham, A. -D., Gebbers, R., Tavakoli, H., Correa, J., Tavares, T. R., Filippi, P., Jones, E. J., Lukas, V., Boenecke, E., Ruehlmann, J., Schroeter, I., Kramer, E., Paetzold, S., Kodaira, M., Wadoux, A. M. J. -C., Bragazza, L., Metzger, K., Huang, J., Valente, D. S. M., Safanelli, J. L., Bottega, E. L., Dalmolin, R. S. D., Farkas, C., Steiger, A., Horst, T. Z., Ramirez-Lopez, L., Scholten, T., Stumpf, F., Rosso, P., Costa, M. M., Zandonadi, R. S., Wetterlind, J., Atzmueller, M.

arXiv.org Artificial Intelligence

Digital soil mapping (DSM) relies on a broad pool of statistical methods, yet determining the optimal method for a given context remains challenging and contentious. Benchmarking studies on multiple datasets are needed to reveal strengths and limitations of commonly used methods. Existing DSM studies usually rely on a single dataset with restricted access, leading to incomplete and potentially misleading conclusions. To address these issues, we introduce an open-access dataset collection called Precision Liming Soil Datasets (LimeSoDa). LimeSoDa consists of 31 field- and farm-scale datasets from various countries. Each dataset has three target soil properties: (1) soil organic matter or soil organic carbon, (2) clay content and (3) pH, alongside a set of features. Features are dataset-specific and were obtained by optical spectroscopy, proximal- and remote soil sensing. All datasets were aligned to a tabular format and are ready-to-use for modeling. We demonstrated the use of LimeSoDa for benchmarking by comparing the predictive performance of four learning algorithms across all datasets. This comparison included multiple linear regression (MLR), support vector regression (SVR), categorical boosting (CatBoost) and random forest (RF). The results showed that although no single algorithm was universally superior, certain algorithms performed better in specific contexts. MLR and SVR performed better on high-dimensional spectral datasets, likely due to better compatibility with principal components. In contrast, CatBoost and RF exhibited considerably better performances when applied to datasets with a moderate number (< 20) of features. These benchmarking results illustrate that the performance of a method is highly context-dependent. LimeSoDa therefore provides an important resource for improving the development and evaluation of statistical methods in DSM.


A text-based, generative deep learning model for soil reflectance spectrum simulation in the VIS-NIR (400-2499 nm) bands

Lei, Tong, Bailey, Brian N.

arXiv.org Artificial Intelligence

Simulating soil reflectance spectra is invaluable for soil-plant radiative modeling and training machine learning models, yet it is difficult as the intricate relationships between soil structure and its constituents. To address this, a fully data-driven soil optics generative model (SOGM) for simulation of soil reflectance spectra based on soil property inputs was developed. The model is trained on an extensive dataset comprising nearly 180,000 soil spectra-property pairs from 17 datasets. It generates soil reflectance spectra from text-based inputs describing soil properties and their values rather than only numerical values and labels in binary vector format. The generative model can simulate output spectra based on an incomplete set of input properties. SOGM is based on the denoising diffusion probabilistic model (DDPM). Two additional sub-models were also built to complement the SOGM: a spectral padding model that can fill in the gaps for spectra shorter than the full visible-near-infrared range (VIS-NIR; 400 to 2499 nm), and a wet soil spectra model that can estimate the effects of water content on soil reflectance spectra given the dry spectrum predicted by the SOGM. The SOGM was up-scaled by coupling with the Helios 3D plant modeling software, which allowed for generation of synthetic aerial images of simulated soil and plant scenes. It can also be easily integrated with soil-plant radiation model used for remote sensin research like PROSAIL. The testing results of the SOGM on new datasets that not included in model training proved that the model can generate reasonable soil reflectance spectra based on available property inputs. The presented models are openly accessible on: https://github.com/GEMINI-Breeding/SOGM_soil_spectra_simulation.


Soil Fertility Prediction Using Combined USB-microscope Based Soil Image, Auxiliary Variables, and Portable X-Ray Fluorescence Spectrometry

Dasgupta, Shubhadip, Pate, Satwik, Rathore, Divya, Divyanth, L. G., Das, Ayan, Nayak, Anshuman, Dey, Subhadip, Biswas, Asim, Weindorf, David C., Li, Bin, Silva, Sergio Henrique Godinho, Ribeiro, Bruno Teixeira, Srivastava, Sanjay, Chakraborty, Somsubhra

arXiv.org Artificial Intelligence

This study explored the application of portable X-ray fluorescence (PXRF) spectrometry and soil image analysis to rapidly assess soil fertility, focusing on critical parameters such as available B, organic carbon (OC), available Mn, available S, and the sulfur availability index (SAI). Analyzing 1,133 soil samples from various agro-climatic zones in Eastern India, the research combined color and texture features from microscopic soil images, PXRF data, and auxiliary soil variables (AVs) using a Random Forest model. Results indicated that integrating image features (IFs) with auxiliary variables (AVs) significantly enhanced prediction accuracy for available B (R^2 = 0.80) and OC (R^2 = 0.88). A data fusion approach, incorporating IFs, AVs, and PXRF data, further improved predictions for available Mn and SAI with R^2 values of 0.72 and 0.70, respectively. The study demonstrated how these integrated technologies have the potential to provide quick and affordable options for soil testing, opening up access to more sophisticated prediction models and a better comprehension of the fertility and health of the soil. Future research should focus on the application of deep learning models on a larger dataset of soil images, developed using soils from a broader range of agro-climatic zones under field condition.


Deep-Learning Framework for Optimal Selection of Soil Sampling Sites

Pham, Tan-Hanh, Acharya, Praneel, Bachina, Sravanthi, Osterloh, Kristopher, Nguyen, Kim-Doang

arXiv.org Artificial Intelligence

This work leverages the recent advancements of deep learning in image processing to find optimal locations that present the important characteristics of a field. The data for training are collected at different fields in local farms with five features: aspect, flow accumulation, slope, NDVI (normalized difference vegetation index), and yield. The soil sampling dataset is challenging because the ground truth is highly imbalanced binary images. Therefore, we approached the problem with two methods, the first approach involves utilizing a state-of-the-art model with the convolutional neural network (CNN) backbone, while the second is to innovate a deep-learning design grounded in the concepts of transformer and self-attention. Our framework is constructed with an encoder-decoder architecture with the self-attention mechanism as the backbone. In the encoder, the self-attention mechanism is the key feature extractor, which produces feature maps. In the decoder, we introduce atrous convolution networks to concatenate, fuse the extracted features, and then export the optimal locations for soil sampling. Currently, the model has achieved impressive results on the testing dataset, with a mean accuracy of 99.52%, a mean Intersection over Union (IoU) of 57.35%, and a mean Dice Coefficient of 71.47%, while the performance metrics of the state-of-the-art CNN-based model are 66.08%, 3.85%, and 1.98%, respectively. This indicates that our proposed model outperforms the CNN-based method on the soil-sampling dataset. To the best of our knowledge, our work is the first to provide a soil-sampling dataset with multiple attributes and leverage deep learning techniques to enable the automatic selection of soil-sampling sites. This work lays a foundation for novel applications of data science and machine-learning technologies to solve other emerging agricultural problems.


Rapid detection of soil carbonates by means of NIR spectroscopy, deep learning methods and phase quantification by powder Xray diffraction

Chiniadis, Lykourgos, Tamvakis, Petros

arXiv.org Artificial Intelligence

Soil near-Infrared (NIR) spectral absorbance/reflectance libraries are utilized towards improving agricultural production and analysis of soil properties which are key prerequisite for agro-ecological balance and environmental sustainability. Carbonates in particular, represent a soil property which is mostly affected even by mild, let alone extreme, changes of environmental conditions during climate change. In this study we propose a rapid and efficient way to predict carbonates content in soil by means of Fourier Transform Near-Infrared (FT-NIR) reflectance spectroscopy and by use of deep learning methods. We exploited multiple machine learning methods, such as: 1) a Multi-Layered Perceptron Regressor (MLP) and 2) a Convolutional Neural Network (CNN) and compare their performance with other traditional machine learning algorithms such as Partial Least Squares Regression (PLSR), Cubist and Support Vector Machines (SVM) on the combined dataset of two NIR spectral libraries: Kellogg Soil Survey Laboratory (KSSL) of the United States Department of Agriculture (USDA), a dataset of soil samples reflectance spectra collected nationwide, and Land Use and Coverage Area Frame Survey (LUCAS) TopSoil (European Soil Library) which contains soil sample absorbance spectra from all over the European Union, and use them to predict carbonate content on never-before-seen soil samples. Soil samples in KSSL and in TopSoil spectral libraries were acquired in the spectral region of visible-near infrared (Vis-NIR) (350-2500 nm), however in this study, only the NIR spectral region (1150-2500 nm) was utilized. Quantification of carbonates by means of X-ray-Diffraction is in good agreement with the volumetric method and the MLP prediction. Our work contributes to rapid carbonates content prediction in soil samples in cases where: 1) no volumetric method is available and 2) only NIR spectra absorbance data are available. Up till now and to the best of our knowledge, there exists no other study, that presents a prediction model trained on such an extensive dataset with such promising results on unseen data, undoubtedly supporting the notion that deep learning models present excellent prediction tools for soil carbonates content.


A deep scalable neural architecture for soil properties estimation from spectral information

Piccoli, Flavio, Rossini, Micol, Colombo, Roberto, Schettini, Raimondo, Napoletano, Paolo

arXiv.org Artificial Intelligence

In this paper we propose an adaptive deep neural architecture for the prediction of multiple soil characteristics from the analysis of hyperspectral signatures. The proposed method overcomes the limitations of previous methods in the state of art: (i) it allows to predict multiple soil variables at once; (ii) it permits to backtrace the spectral bands that most contribute to the estimation of a given variable; (iii) it is based on a flexible neural architecture capable of automatically adapting to the spectral library under analysis. The proposed architecture is experimented on LUCAS, a large laboratory dataset and on a dataset achieved by simulating PRISMA hyperspectral sensor. 'Results, compared with other state-of-the-art methods confirm the effectiveness of the proposed solution.


Predicting and Mapping of Soil Organic Carbon Using Machine Learning Algorithms in Northern Iran

Emadi, Mostafa, Taghizadeh-Mehrjardi, Ruhollah, Cherati, Ali, Danesh, Majid, Mosavi, Amir, Scholten, Thomas

arXiv.org Machine Learning

Estimation of the soil organic carbon content is of utmost importance in understanding the chemical, physical, and biological functions of the soil. This study proposes machine learning algorithms of support vector machines, artificial neural networks, regression tree, random forest, extreme gradient boosting, and conventional deep neural network for advancing prediction models of SOC. Models are trained with 1879 composite surface soil samples, and 105 auxiliary data as predictors. The genetic algorithm is used as a feature selection approach to identify effective variables. The results indicate that precipitation is the most important predictor driving 15 percent of SOC spatial variability followed by the normalized difference vegetation index, day temperature index of moderate resolution imaging spectroradiometer, multiresolution valley bottom flatness and land use, respectively. Based on 10 fold cross validation, the DNN model reported as a superior algorithm with the lowest prediction error and uncertainty. In terms of accuracy, DNN yielded a mean absolute error of 59 percent, a root mean squared error of 75 percent, a coefficient of determination of 0.65, and Lins concordance correlation coefficient of 0.83. The SOC content was the highest in udic soil moisture regime class with mean values of 4 percent, followed by the aquic and xeric classes, respectively. Soils in dense forestlands had the highest SOC contents, whereas soils of younger geological age and alluvial fans had lower SOC. The proposed DNN is a promising algorithm for handling large numbers of auxiliary data at a province scale, and due to its flexible structure and the ability to extract more information from the auxiliary data surrounding the sampled observations, it had high accuracy for the prediction of the SOC baseline map and minimal uncertainty.